- _____________________________ Subj: VGA Writes _____________________________
-
- Fm: Activision/Infocom 76004,2122 # 193075
- To: Dan R Corritore 70243,1110 (X) Date: 29-Jul-92 13:24:20
-
- Here's a question. If I'm writing bytes/words (depending on the card) from
- system ram to a VGA mode 13H display, and I interleave some processing
- between byte/word writes, will the VGA hardware and the AT-BUS still wait
- state me? In other words is:
-
- MOVE VID,AX
- JSR SOMEWHERE
- MOVE VID,AX
- JSR SOMEWHERE
- etc...
-
- Faster than...
-
- MOVE VID,AX
- MOVE VID,AX
- etc.
- JSR SOMEWHERE
- JSR SOMEWHERE
-
- Thanks,
-
- William Volk
- ...........................................................................
-
- Fm: Hans Peter Rushworth 100031,473 # 193218
- To: Activision/Infocom 76004,2122 Date: 29-Jul-92 19:20:06
-
- For a 386 processor (I think also for a 286), there are independent bus and
- execution units that are able to overlap execution to some extent. This
- means that external cycles should not affect instructions that are
- pre-fetched and ready for execution. I'm not sure to what extent the internal
- cache of the 486 assists here. So the answer to your question is probably
- yes, although I would suggest using this feature to do work on internal
- registers rather than CALL subroutines.
-
- I suggest you write a test program to determine how much the improvement is.
- About the best way to speed performance when writing words to VGA is to
- ensure that the target address is on an even boundary. A simple bit of code
- should explain this (part stolen from BC++ memcpy) -
- ;
- ; CX = pixel count (!= 0), DS:SI-> source bitmap, ES:DI->VGA RAM
- ;
- test di,1 ;odd address ???
- je short aligned ;no, begin with even address
- movsb ;make it even and move a pixel
- dec cx ;fix pixel count (if CX was zero on entry we're in trouble)
- aligned:
- shr cx,1 ;convert to words and set carry if odd number of bytes
- rep movsw ;copy stuff (doesn't affect carry) aligned so 1 mem cycle
- ;every time for a 16-bit VGA
- adc cx,cx ;inc CX (set it to 1) if carry set (ends on odd address)
- rep movsb ;NOTE: doesn't do move if CX is zero
-
- The separate bus I/F overlap and pre-fetch will help absorb the extra
- instructions. Hope that helps.
-
- Peter.
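
For readers who think in C rather than assembly, here is a rough sketch of the same alignment idea. The function name `copy_word_aligned` is made up for this illustration and is not from BC++ or the thread; it only mirrors the MOVSB / REP MOVSW / MOVSB sequence, and whether a given compiler turns the inner loop into word-sized bus cycles is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy n bytes so that the bulk
   of the transfer is done as 16-bit words that start on an even destination
   address. */
void copy_word_aligned(uint8_t *dst, const uint8_t *src, size_t n)
{
    if (n == 0)
        return;                       /* the asm version assumes CX != 0 */

    if (((uintptr_t)dst & 1u) != 0) { /* odd destination address? */
        *dst++ = *src++;              /* one byte makes it even */
        n--;
    }

    size_t words = n / 2;             /* SHR CX,1 */
    for (size_t i = 0; i < words; i++) {
        uint16_t w;
        memcpy(&w, src, sizeof w);    /* source may still be unaligned */
        memcpy(dst, &w, sizeof w);    /* destination is even here */
        src += 2;
        dst += 2;
    }

    if (n & 1u)                       /* carry out of SHR: one byte left */
        *dst = *src;
}
```

Unlike the assembly, this version returns early for a zero count instead of relying on the caller.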
- ...........................................................................
-
- Fm: John W. Ratcliff 70253,3237 # 270346
- To: Serge Mathieu 71035,2771 (X) Date: 30-Dec-92 14:11:11
-
- Serge,
-
- About the VGA screen copy stuff.
-
- I do a REPE CMPSD, to do a double word compare to find all double words that
- are the same. Then I do a REPNZ CMPSD to find the number of double words
- that are different. Then back off the pointers, and move only the double
- words that are changed to screen ram, and your system ram copy of screen ram.
- Then back up to the top of the loop. I know it sounds crazy but VRAM is so
- slow that this turns out to be anywhere from a lot faster to many, many,
- times faster on really slow VGA cards. A REPE CMPSW works just as well, but
- use those 32 bit ops whenever you can. The disadvantage to this method is
- you use up more of your precious system ram. The advantages are huge, and I
- hope obvious.
-
- I haven't really thought about this approach for a panning/scrolling type
- environment. I think more from the simulation standpoint where you are
- re-rendering an entire screen, at extremely high frame rate, and most of the
- pixels are the same color from frame to frame, just by the nature of your
- rendering system. If most of the pixels are changing, then this method
- sucks. My wireframe demo points this out quite nicely.
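
A minimal C sketch of the compare-then-copy loop described above, under some assumptions: `frame` is the freshly rendered image, `shadow` is the system RAM copy of what is on screen, and `vram` stands in for video memory. The names and the function `update_changed` are invented for this example; the real thing would use the string compare instructions directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: all three buffers hold 'count' 32-bit words.
   Returns how many dwords were actually written to vram. */
size_t update_changed(uint32_t *vram, uint32_t *shadow,
                      const uint32_t *frame, size_t count)
{
    size_t written = 0;
    size_t i = 0;

    while (i < count) {
        /* Skip the run of identical dwords (the REPE CMPSD step). */
        while (i < count && frame[i] == shadow[i])
            i++;

        /* Copy the run of differing dwords (the REPNE CMPSD step). */
        while (i < count && frame[i] != shadow[i]) {
            vram[i]   = frame[i];     /* slow write, only when needed */
            shadow[i] = frame[i];     /* keep the shadow copy in sync */
            written++;
            i++;
        }
    }
    return written;
}
```

The return value makes it easy to see the trade-off mentioned in the message: when nearly every dword changes, the compares are pure overhead on top of the same number of video writes.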
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 283754
- To: Serge Mathieu 71035,2771 Date: 22-Jan-93 19:49:11
-
- I don't know if you got a solid reply yet but on all my systems, when
- blitting to video memory, writing words to odd addresses will slow the blit
- down by as much as 50%, depending on the video card.
-
- In normal RAM, odd word writes will cause a penalty of 33%, regardless of
- cache or processor speed. This is because instead of accessing the MMU
- (memory management unit) and writing once, the instruction has to write,
- access the MMU, and write again.
- ...........................................................................
-
- Fm: John Dlugosz [ViewPoint] 70007,4657 # 341935
- To: VOR Technologies Inc 71333,134 (X) Date: 26-Apr-93 08:38:36
-
- A word-aligned MOVSD does not save that much over MOVSW's. The video bus is
- the bottleneck and it still copies the same number of bytes.
-
- I used to know how many bus cycles it took for different kinds of transfers,
- but I've forgotten. The numbers I just got empirically indicate 5 or 6 bus
- cycles per word. That seems high. But it depends on the card's buffer for
- receiving data before actually storing it in its own RAM, and how fast that
- can get processed depends on the card's timing speed (ram will be busy
- displaying and can't be stored to) as well as how the card was made. A read
- cycle, I recall, takes 850 to 1050 ns all told (but reads can't use the
- buffer).
-
- A single function call will take as much as 20 machine clocks (on a '386),
- which is _nothing_ compared to the time of copying the memory to video. About
- 5 pixels worth!
-
- --John
-
- ______________________ Subj: Local Variables On Stack ______________________
-
- Fm: Hans Peter Rushworth 100031,473 # 261886
- To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 14-Dec-92 22:26:11
-
- On the subject of tweaks, have you tried aligning all your local variables so
- the double words and words are all aligned? You can do this by ordering your
- locals so the dwords, words and bytes are all grouped together, then clearing
- the lower two bits of the BP. (You also have to ensure that there is a dummy
- dword variable at the bottom of the stack frame to cater for the potential
- "drop" of the BP.) You also have to copy the parameters (if any) to the locals
- area.
-
- I think this is potentially worthwhile for those functions that exist for a
- longish period of time, and where you "run out" of registers.
-
- Peter.
- ...........................................................................
-
- Fm: Hans Peter Rushworth 100031,473 # 262001
- To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 15-Dec-92 03:02:28
-
- >> What's the effect of clearing the low 2-bits of BP? You're just
- subtracting a max of decimal 2 from the value, correct?
-
- actually a max of 3 <g>. The reason behind this is that it makes the SS:BP
- point at a long word address boundary. This means that if (say) you were to
- execute the instruction:
-
- LES SI, DWORD PTR [BP-4] ; (for sake of argument) reads 4 bytes of data
-
- then a 32 bit wide data bus CPU only needs to do 1 memory bus cycle to read
- or write the 4 bytes of data. For this to be effective, you would organise
- your local variables so that all the dwords are at the top of the stack
- frame, all the words are under that, and finally all the bytes under that.
- This then guarantees that when you access a local variable, the minimum
- number of memory cycles are needed. Sometimes when the function is called the
- stack will be correctly aligned, and this makes no difference, but other
- times it may not.
-
- Peter.
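
The "max of 3" point is easy to check with a couple of lines of C. This is only an illustration of the masking arithmetic; the function names are made up, and a 16-bit offset is assumed, as with BP in real mode.

```c
#include <stdint.h>

/* Clear the low two bits of a 16-bit offset, as AND BP,0FFFCh does. */
uint16_t align_down4(uint16_t off)
{
    return (uint16_t)(off & 0xFFFCu);
}

/* How far the value "dropped"; always in the range 0..3. */
uint16_t align_drop(uint16_t off)
{
    return (uint16_t)(off - align_down4(off));
}
```

For example, `align_down4(0x11CA)` gives `0x11C8` (a drop of 2), while an offset ending in binary 11 drops by the full 3 bytes, which is why the spare dword at the bottom of the frame is needed.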
- ...........................................................................
-
- Fm: Mark Betz/Ass't SysOp 76605,2346 # 262161
- To: Hans Peter Rushworth 100031,473 (X) Date: 15-Dec-92 13:42:06
-
- Right, 3. That's two bits worth, correct? <g>. Let me make sure I understand
- this: the idea is that the stack will be dword aligned for 32-bit accesses,
- and that it won't make any difference to byte-wide or word-wide accesses. One
- thing still confuses me (p'raps more than one). Let's say that you have a
- stack frame that looks like this on entry to a function:
-
- dword <- BP + 14
- dword <- BP + 10
- word <- BP + 8
- word <- BP + 6
- byte <- BP + 4
- byte <- BP + 2
- word <- saved BP
-
- Suppose that BP == 11CA. Clear the lower two bits and you have 11C8. Now BP
- points to the word right below the saved BP. Do you simply add in an offset
- in order to correctly address the stack values now? MOV AH,[BP+4] gets the
- first byte parameter from the stack, instead of MOV AH,[BP+2]. You basically
- have to add 2 to all of your offsets. Is that how you'd handle it?
-
- --Mark
- ...........................................................................
-
- Fm: Hans Peter Rushworth 100031,473 # 262239
- To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 15-Dec-92 16:10:10
-
- The precise location of the locals [BP-n] doesn't matter naturally, but the
- function parameters cannot be accessed using [BP+n], so you need to copy them
- into the local data area, and use the local copies. Just copy the BP into
- BX (for example) before masking it, and then use BX to move the parameters.
-
- Two other things:
-
- (1) you have to allocate an extra (dummy) long word local at the bottom of
- the local stack frame, so that if BP is decremented by 3 accessing the
- bottom local won't blow the stack.
-
- (2) You need to save the original (unmasked) BP so that you can copy this
- value back into SP at the end of the function, (normally you copy BP->SP),
- instead you just load SP from this saved value (which could be another
- local). Example:
-
- push bp ;save stack frame
- mov bp,sp ;new stack frame
- mov bx,bp ;keep copy of original frame
- and bp,0FFFCh ;align BP
- sub sp,FRAMESIZE ;allocate space
- push di ;save register variables
- push si ;
- mov [bp-SAVEBP],bx ;save original BP
- mov ax,ss:[bx+6] ;get arg1
- mov [bp-param1],ax ;copy it ... same for other args
- --- rest of function ----
- pop si ;restore register vars
- pop di ;
- mov sp,[bp-SAVEBP] ;original frame
- pop bp ;callers bp
- retf ;exit
-
- Peter.
-
- _____________________ Subj: Function Arguments on Stack _____________________
-
- Fm: Hans Peter Rushworth 100031,473 # 261555
- To: Jesse 76646,3302 (X) Date: 14-Dec-92 14:10:26
-
- >> what IS pushed by a function automatically?
-
- Normal function:
-
- High memory
-
- argn <-- calling function pushes arguments in reverse order
- arg2 eg f( arg1, arg2, ..., argn )
- arg1
- <return address> <--- the CALL instruction pushes the IP or CS:IP (model
- dependent) of the next instruction of the calling func
- _______________________ENTERS NEW FUNCTION_________________________________
- BP <--- The called function saves the old base pointer
- and moves this stack address into the new BP
- <local variables><--- The stack pointer is adjusted to make space
- for the function's local (automatic) variables
- DI <--- The called function pushes register variables
- SP: SI (I may have the order wrong here)
- <rest of stack> <--- used for temporary pushes and pops, other function
- calls and interrupts.
- Low memory
-
- Cleanup on exit: first SI and DI are popped, then BP is copied back into
- SP, SP now points at the calling function's BP on the stack. BP is popped
- and a RET is performed, returning to the calling function. The calling
- function will usually do an ADD SP,n to "remove" the arguments it placed on
- the stack.
- The function return value is placed in AX or DX:AX depending on the size.
- The called function accesses variables using BP, positive offsets are used
- for the function actual parameters, and negative ones for the locals.
-
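- When an interrupt occurs (hardware or software) the following ends up on
- the stack. The CPU itself pushes only FLAGS, CS and IP; the remaining
- registers are pushed by the entry code the compiler generates for an
- interrupt function.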
-
- FLAGS register <-- after the push the interrupt mask is set.
- CS
- IP
- AX
- BX
- CX
- DX
- ES
- DS
- SI <--- Register variables are automatically saved
- DI neither function explicitly saves or restores them
- BP <--- The SP at this address is copied into the BP below
- ____________________________________ Enters interrupt handler_______
- <local variables> <--- The function does a SUB SP,n to allocate space
- for local variables, and sets up DS to point to
- the handlers data segment.
- SP:
- <rest of stack> <--- used for push pop etc
-
- Cleanup: The BP is copied back into the SP, the saved registers are popped,
- and an IRET instruction is executed, which restores IP, CS and FLAGS. (The
- function may modify the registers on the stack to return values if this is a
- software interrupt, but for hardware interrupts the registers on the stack
- must be READ ONLY).
-
- I hope that is more or less a correct description.
-
- Peter.
-
- BTW, did you realise that the LOOP instruction on a 386/486 is actually SLOWER
- than the equivalent separate decrement and branch instructions?
- ...........................................................................
-
- Fm: Mark Betz/Ass't SysOp 76605,2346 # 266171
- To: Jesse 76646,3302 (X) Date: 22-Dec-92 11:35:30
-
- Hi, Jesse. If you're saving all of the registers, then there's probably no
- harm in using PUSHA/POPA, unless there are some hidden side effects that I'm
- not aware of. However, there are registers that you don't need to save, even
- if you're using them. AX is one, since the compiler expects it to be used for
- return values. Also, SP isn't really restored, since its value is discarded
- by the POPA instruction, not copied back into the register. So there's 2 that
- the instruction isn't needed for. That leaves 6. A PUSHA takes 18 clocks on
- the 386, while PUSH only requires 2. POPA takes 24 clocks on the 386, and POP
- takes 4. So you're wasting 6 clocks on the PUSHA, but the POPA works out
- even. If the function is one that is called in a tight loop, say 10,000
- times, then you're blowing off 60,000 clocks <g>.
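
The arithmetic above can be written out as a small C helper. The clock counts are simply the figures quoted in the message (treat them as the poster's numbers, not as verified data-sheet values), and `extra_clocks_per_call` is a name invented for this sketch.

```c
/* Clock counts as quoted in the message above for the 386. */
enum { PUSHA_CLOCKS = 18, PUSH_CLOCKS = 2, POPA_CLOCKS = 24, POP_CLOCKS = 4 };

/* Extra clocks spent per call by PUSHA/POPA versus pushing and popping
   only 'regs' individual registers. Negative means PUSHA/POPA is cheaper. */
long extra_clocks_per_call(int regs)
{
    long individual = (long)regs * (PUSH_CLOCKS + POP_CLOCKS);
    return (PUSHA_CLOCKS + POPA_CLOCKS) - individual;
}
```

With six registers this gives 6 clocks per call, or 60,000 over 10,000 calls; with only three registers to save, the gap grows to 24 clocks per call.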
-
- _______________________ Subj: 386 Instruction Timing _______________________
-
- Fm: KGliner 70363,3672 # 318190
- To: all Date: 21-Mar-93 22:53:29
-
- A simple asm question for you all:
-
- On a 386, do these instructions take the same amount of time or is one
- faster than the other:
-
- mov [si],al
- mov [si + 512],al
- ...........................................................................
-
- Fm: Mike W. Smith 75300,3434 # 318283
- To: KGliner 70363,3672 (X) Date: 22-Mar-93 01:59:24
-
- KG>On a 386, do these instructions take the same amount of time or is one
- KG>faster than the other:
-
- KG> mov [si],al
- KG> mov [si + 512],al
-
- On a 386, a "MOV mem,reg" is 2 clock cycles for any effective address.
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 318977
- To: Mike W. Smith 75300,3434 (X) Date: 23-Mar-93 09:20:46
-
- Except when an offset is used and that takes 1 cycle per byte of offset. If
- the offset is BYTE, one cycle. If the offset is WORD, two cycles.
-
- At least that's what Turbo Profiler says when I run the test.
-
- Randy
- ...........................................................................
-
- Fm: Mike W. Smith 75300,3434 # 319385
- To: Randy @ Safari 71165,3600 Date: 23-Mar-93 23:50:43
-
- RS>Except when an offset is used and that takes 1 cycle per byte of offset.
- RS>If the offset is BYTE, one cycle. If the offset is WORD, two cycles.
-
- That's different from what my tech reference says. A MOV reg,mem takes the
- same time whether it's a byte or word. For 8086/88 processors the effective
- address adds anywhere from 5 to 14 clock cycles to an instruction. For
- 286/386 processors, the only case where an extra clock is added is when all
- three indexing elements are used (base, index, and displacement).
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 320207
- To: Mike W. Smith 75300,3434 (X) Date: 25-Mar-93 09:22:21
-
- ->That's different from what my tech reference says. A MOV reg,mem
- ->takes the same time whether it's a byte or word
-
- That's true, except when you use an offset like [si+512] which causes an ADD
- to be performed prior to the fetch. My timings are as such
-
- 10,000 iterations (includes loop time)
- mov al,[si] .0047
- mov al,[si+512] .0051/.0052 (fluctuated)
-
- That was the original question, right?
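
To put those two totals in proportion, here is a tiny C helper. It assumes nothing about the unit of the figures (the message does not state one) and only compares them; `relative_slowdown` is a name made up for this note.

```c
/* Relative slowdown of run b versus run a, as a fraction (0.085 = 8.5%). */
double relative_slowdown(double a, double b)
{
    return (b - a) / a;
}
```

`relative_slowdown(0.0047, 0.0051)` comes out at roughly 0.085, so the displacement form measured about 8 to 11 percent slower across the two readings, loop overhead included.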
-
- _____________________________ Subj: 32 Bit Code _____________________________
-
- Fm: Sarwan Narine 76675,164 # 319332
- To: All Date: 23-Mar-93 22:17:49
-
- Consider the following instructions:
- #1. MOV AL, DS:[SI]
- #2. MOV AL, DS:[ESI]
-
- These instructions can be used to accomplish the same task. However, under
- certain circumstances instruction #2 will fail. If SMARTDRV is _not_ loaded
- then instruction #2 causes a hang-up, however, a CTRL-ALT-DEL will reset my
- system. What does SMARTDRV do to enable instruction #2 to execute? BTW, my
- program is written for 32-bit CPUs only. Thanks for any insight.
- ...........................................................................
-
- Fm: Jaimi McEntire 71700,1202 # 319345
- To: Sarwan Narine 76675,164 Date: 23-Mar-93 22:38:05
-
- Sarwan, if your code before that loaded si (because you wanted a word), the
- esi register could have trash in the upper 16 bits. In that case, you would
- definitely need to either clear out esi (mov esi,0) before loading it, or
- you would need to zero-extend it as you moved it (movzx; cwde only extends
- AX into EAX). Just as a side note, you can of course use any 32 bit register
- as an index on the 386, if you did not know that. Also, you can use FS and GS
- too. All you need to do (if you are using Borland C) is compile via assembly.
- P.S. Smartdrv probably enables #2 to execute because it clears the
- registers, because it too has 32 bit code.
-
- Jaimi
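
A small C model of the "trash in the upper 16 bits" problem may help. It is only a sketch of the register arithmetic, with invented function names; it does not model the exception itself.

```c
#include <stdint.h>

/* Model of a 32-bit register whose low half was loaded via its 16-bit name:
   the upper 16 bits keep whatever was there before. */
uint32_t load_low16(uint32_t old_esi, uint16_t si)
{
    return (old_esi & 0xFFFF0000u) | si;
}

/* What MOVZX ESI,SI would give: upper half cleared. */
uint32_t zero_extend16(uint32_t esi)
{
    return esi & 0x0000FFFFu;
}

/* True if the offset fits inside a 64K segment limit. */
int within_64k(uint32_t offset)
{
    return offset <= 0xFFFFu;
}
```

If ESI previously held 12340000h and SI is then loaded with 0010h, ESI is 12340010h, which is far beyond a 64K limit; after zero extension the offset is the intended 0010h.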
- ...........................................................................
-
- Fm: Bruce Nehlsen 76535,2466 # 319363
- To: Jaimi McEntire 71700,1202 (X) Date: 23-Mar-93 23:00:26
-
- Sarwan -
-
- Another comment, since I had the same problem, except my code would CRASH if
- and only if EMM386 was loaded.
-
- Anyway, in my case it turned out to be my assembly directives. In some
- modules I used <.code, .data> , and in some I used < DATA SEG ">. In one of
- those 2 methods, the assembler was inserting ENTER and LEAVEs, which made a
- big mess, since I already had those in there.
-
- Bottom line - check the .LST file, and ensure that what you WROTE is what the
- assembler generated.
-
- Later...
- ...........................................................................
-
- Fm: Dan Corritore 70243,1110 # 319394
- To: Sarwan Narine 76675,164 Date: 24-Mar-93 00:05:40
-
- Another thing I'd like to add to what the others have said is that you can't
- use a value greater than 65535 in ESI if you have not changed the segment
- limit stuff on the computer. The 386 (or higher) computer starts up with the
- segment limits set to be 65535. If you can, always debug 386-specific code
- with a 386-specific debugger.. it'll allow you to see things other debuggers
- won't (and also capture exceptions --one of which is activated by using
- invalid segment limits).
- _Dan
- ...........................................................................
-
- Fm: rod lentz 71163,57 # 319438
- To: Dan Corritore 70243,1110 (X) Date: 24-Mar-93 02:53:58
-
- re: segment limits, &c...
- By my understanding, most machines running under DOS these days
- are actually running in a virtual 86 (managed by emm386 or similar)
- most of the time. In v86 mode, as I understand it, the segment limit
- is always 0xffff. Therefore, without switching to protected mode
- (via VCPI/DPMI/whatever), there shouldn't be any advantage to using
- a dword subscript (such as [esi]).
- Anybody care to confirm/refute this ?
-
- - Rod
- ...........................................................................
-
- Fm: Rob Nicholson (HMS Ltd) 100060,154 # 319468
- To: rod lentz 71163,57 (X) Date: 24-Mar-93 05:40:20
-
- There appears to be a 'fudge' that allows the segments to be >65535 in real
- mode. One of the memory managers or disk caches (can't remember which) left
- the segments unbounded.
-
- Rob.
- ...........................................................................
-
- Fm: Dan Corritore 70243,1110 # 319725
- To: rod lentz 71163,57 (X) Date: 24-Mar-93 15:57:14
-
- You are right. There is no advantage at all to using ESI over SI
- without going into protected mode and switching the segment limit stuff
- yourself (or having one of those DOS extenders, I believe). If you have to do
- it yourself, get a 386 or 486 specific book which deals with that kind of
- stuff(or both).. I'm planning on doing so one day when I feel the need for
- stuff like that. Well, anyway, that stuff doesn't stop you from using the
- 32-bit registers and 32-bit instructions, though, so play with them all you
- want!<g>
- _Dan
- ...........................................................................
-
- Fm: rod lentz 71163,57 # 319954
- To: Dan Corritore 70243,1110 Date: 24-Mar-93 21:22:54
-
- Rob - I've heard about the glitch/feature that allows segments
- > 64k in real mode, but like I said, most PC's are spending most of
- their time in v86 mode these days, where I don't believe it works.
- And unfortunately, the predominant protected mode spec in use is
- VCPI; has anybody else tried sifting through that one ? Not the
- easiest spec I've seen...
- Dan - I have been using the 32 bit reg's for math & stuff; I was
- just hoping somebody knew of a way to do 32-bit addressing from
- inside v86 mode. Segmented far pointers are a big clock-killer in
- most of my apps.
-
- - Rod
- ...........................................................................
-
- Fm: Jaimi McEntire 71700,1202 # 321511
- To: Dan Corritore 70243,1110 Date: 27-Mar-93 10:42:03
-
- Oh, one other thing - you need to ignore the segment registers in flat model.
- instead of using ES:DI, you would just use EDI. (or any other index or
- general register for that matter).
-
- Jaimi
- ...........................................................................
-
- Fm: rod lentz 71163,57 # 320768
- To: Rob Nicholson (HMS Ltd) 100060,154 Date: 26-Mar-93 04:42:26
-
- Nope, (real mode != v86) ! Very similar, but not the same.
- In v86 mode, a "supervising" program is needed to handle details of
- the virtualization (virtual to physical memory mapping, &c.).
- Also, from v86 mode you can't take advantage of the glitch/feature
- of the 386/486's, where you can load the segment limits with large
- (4 gig !) values in protected mode, switch back to real mode, and
- then access huge segments from real mode.
-
- - Rod
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 321275
- To: Serge Mathieu 71035,2771 (X) Date: 26-Mar-93 22:53:07
-
- Well, for one...
-
- Movs to and from memory in Protected mode can take as much as 18
- machine cycles as compared to 2 to 5 in real mode (2 on a 486, up to 5 on a
- 386).
- This is your MAJOR slowdown.
-
- I can't quote other speeds cause I've left my docs at home. But I am sure
- some of the other memory intensive functions like AND/OR/XOR have the same
- problem.
-
- Randy
- Safari
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 321274
- To: Serge Mathieu 71035,2771 (X) Date: 26-Mar-93 22:52:45
-
- Ok, here goes..
-
- in REAL mode, you have this...
-
- +---------------------------+ 0K
- | |
- ~ ~
- | |
- +---------------------------+ 640k
- | |
- | |
- | |
- | |
- +---------------------------+ Top of memory (max for machine)
-
- the first part is what you can address directly from your program. The
- second part MUST be addressed through a memory manager and is EXTREMELY slow.
-
- In REAL FLAT MODE, you have the same thing, but (a) a FLAT MODEL HEAP MANAGER
- allocates far memory quickly in blocks much larger than the page frame
- (usually 64k) of EMS or XMS, and (b) you can access all of the allocated
- memory with one instruction.
-
- for example, in real mode, to access memory WITHIN the 640k boundary you must
- do this (or the same thing some other way<g>)
-
- asm les di, dword ptr [some_ptr] ; some_ptr is the FAR address
- asm mov al, byte ptr es:[di] ; this eats cycles cause of the
- ; ES segment override.
-
- in REAL FLAT MODE, you do this
-
- asm mov ebx, dword ptr [some_ptr] ; loads all 32 bits into EBX
- ; this is also faster than LES DI
- asm mov al, byte ptr [ebx] ; 5 cycles maximum
-
- in protected mode, you do the same as in REAL FLAT MODE but it takes longer
- because the processor is handling many tasks (internal and external), as well
- as watching for segment overruns, at one time.
-
- I'll post more tomorrow.
-
- Randy
- Safari
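
For comparison with the single 32-bit address used in flat mode, this is how a real-mode segment:offset pair maps onto a linear address. The helper name `far_to_linear` is made up for this sketch; the arithmetic itself (segment times 16 plus offset) is standard real-mode behaviour.

```c
#include <stdint.h>

/* Real-mode address arithmetic: linear = segment * 16 + offset. */
uint32_t far_to_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

`far_to_linear(0xA000, 0)` is A0000h, the start of the mode 13h frame buffer. Different pairs can name the same byte (1234:0010 and 1235:0000, for instance), which is part of why far pointer arithmetic costs more than a flat 32-bit pointer.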
- ...........................................................................
-
- Fm: rod lentz 71163,57 # 321380
- To: Randy @ Safari 71165,3600 Date: 27-Mar-93 04:58:29
-
- Randy -
- Now, by "real flat mode", I assume you're using the trick of
- loading up large selectors in protected mode, then switching back
- to real mode (i.e., the "seg4g" trick) ? Which, as I understand,
- doesn't work in v86 mode, i.e. anytime emm386 or similar managers
- are running ?
- Also, about your statements re:speed in different modes -
- to the best of my understanding, all modes operate fairly equally.
- The big killers are calling through task gates, and switching to/from
- protected mode (which of course is needed for DOS calls, handling
- real-mode interrupts, &c.). As far as the processor watching for
- segment overruns, I believe that's done in all modes; it's just
- a lot more likely to cause an exception in protected mode. Am I
- mistaken ?
-
- - Rod
- ...........................................................................
-
- Fm: Mark Betz/Ass't SysOp 76605,2346 # 320453
- To: Serge Mathieu 71035,2771 (X) Date: 25-Mar-93 18:53:58
-
- I don't know that EMS is too slow. You can get 64k pages at a time, so it's
- effectively like having a number of segments on tap. You just have to switch
- them into the page frame. I'm not very expert on this topic, so I'd best
- leave specific performance details to others. If Eric Pinnel is lurking
- he'll tell you that the trend is towards 32-bit flat memory model, and I
- think he'd be right.
-
- --Mark
- ...........................................................................
-
- Fm: Rob Nicholson (HMS Ltd) 100060,154 # 320760
- To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 26-Mar-93 03:43:29
-
- AFAIK, EMS (expanded) is faster than XMS (extended) when used from real mode.
- With EMS, 64k can be banked into memory almost instantly. With XMS, you keep
- having to copy chunks of memory backwards and forwards between extended and
- conventional memory.
-
- Rob.
- ...........................................................................
-
- Fm: Mark Betz/Ass't SysOp 76605,2346 # 321090
- To: Rob Nicholson (HMS Ltd) 100060,154 Date: 26-Mar-93 18:23:02
-
- Hi, Rob. Wouldn't you also have to back-shuttle the EMS page if it changed? I
- suppose you could use it for read-only stuff.
-
- --Mark
- ...........................................................................
-
- Fm: Dan Corritore 70243,1110 # 320656
- To: Serge Mathieu 71035,2771 (X) Date: 25-Mar-93 22:31:56
-
- Yeah, I'd opt for flat mode anyday, but EMS and XMS aren't all that bad to
- use. I used EMS very briefly a while ago, but its speed wasn't that slow.
- Anyway, you always access it through a certain frame of memory (usually
- D000-DFFF). I used it to just add a spare 64K of memory to my program (to use
- just like anything else), but that's not really a good use for it at all..
- XMS and the others I really don't know much about, but I'll probably run into
- them sooner or later... actually, my PC INTERN book looks like it has a few
- good sections on the various ways of accessing extended memory . Oh well,
- that's all I could do..
- _Dan
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 321652
- To: rod lentz 71163,57 (X) Date: 27-Mar-93 14:51:54
-
- No. The loading of the "large selectors" or the top 16 bits of the 32 bit
- registers is not a function of protected mode. Nor is it slower or equally as
- fast.
-
- Real mode 32 bit moves take 2 to 5 cycles depending on the CPU, 2 for 486, up
- to 5 on a 386.
-
- In protected mode, you have bounds checking that occurs inside the processor
- that takes extra cycles thus causing the incredible lag in MOV times. No
- state switching occurs except in the startup where the processor is told to
- ignore segment boundary violations.
-
- Randy
- ...........................................................................
-
- Fm: rod lentz 71163,57 # 321737
- To: Randy @ Safari 71165,3600 (X) Date: 27-Mar-93 17:23:20
-
- So, does real flat mode work with emm386 (or similar) loaded ?
- And, if so, do you have any sample code you're willing to share
- of how to set it up ?
- Also, what tools are you developing with then ? I assume mostly
- assembler, to get the addressing modes you need.
-
- As far as the bounds checking & other penalties in protected
- mode - is this mentioned in the Intel doc's ? I don't remember seeing
- that mentioned.
- And state switching should be needed when running protected mode
- under DOS, to handle hardware interrupts, DOS i/o, and interfacing
- with all that other real mode code sitting underneath the protected
- app.
-
- - Rod
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 321862
- To: rod lentz 71163,57 (X) Date: 27-Mar-93 21:16:29
-
- No. State switching is not needed.
-
- No memory manager, except HIMEM can be loaded as they all put the system in
- V86 mode.
-
- The tool is called BCCX32 and is put out by Network Systems Design.
- (414) 231-3333 out of Oshkosh (b'gosh<g>), Wisconsin.
- The guy's name is Jim Dempsey and after looking at his sample code, it looks
- pretty good.
-
- BCCX32 is a postprocessor that takes your BCC/TCC generated ASM output from
- the compiler and strokes it, optimizes it, and re-generates 32-bit flat model
- code that will run as it sits.
-
- I know it sounds bizarre but it works.
-
- Randy Safari
- ...........................................................................
-
- Fm: rod lentz 71163,57 # 322004
- To: Randy @ Safari 71165,3600 (X) Date: 28-Mar-93 00:59:40
-
- Sounds like an interesting tool. I like the idea of the code
- post-processor, so you can still use your compiler; nifty !
- However, the (expected) memory manager conflict bothers me.
- In my experience, having the user reconfigure/reboot/&c. is the
- type of thing that causes many gripes. For some of the "turnkey"
- systems I work on, it's still a possibility, but for anything aimed
- at more general release, I shudder. What's your experience in
- dealing with that ?
-
- - Rod
- ...........................................................................
-
- Fm: Randy @ Safari 71165,3600 # 322180
- To: rod lentz 71163,57 (X) Date: 28-Mar-93 12:30:24
-
- ->experience in dealing with that.
-
- None yet. EPIC's doing it with ZONE 66 and I REALLY don't like the
- idea but as FLAT model packages become more and more the norm, people
- WILL get used to it.
-
- Randy
- ...........................................................................
-
- Fm: Rob Nicholson (HMS Ltd) 100060,154 # 322060
- To: Mark Betz/Ass't SysOp 76605,2346 (X) Date: 28-Mar-93 06:19:29
-
- As most of our use for EMS is for storing bit-maps, I suppose it's read-only
- and it works quite well. XMS copying is much slower for this purpose.
-
- Rob.
-
- _______________________ Subj: Boolean Sprite Masking _______________________
-
- Fm: TIM 76247,1130 # 343432
- To: ALL Date: 28-Apr-93 14:01:02
-
- Here's one for the blitheads out there... <grin> I want to merge two
- sprites, held in character arrays. (Let's call them A and B -- both "unsigned
- char".) An individual value of '0' is equivalent to transparency.
- Here's the question: is there a set of strictly Boolean operators that will
- let me merge these two arrays? In other words,
- C = (A & mask) | B,
- where 'mask' is 0xFF everywhere B[?] = 0x00, and 0x00 everywhere else.
- It seems to me that there is no Boolean way to create 'mask' (which looks
- like the output of a stepping function), but maybe I'm just not thinking
- clearly enough today.
- Loops are EVIL. <grin> Is there a better way?
- ...........................................................................
-
- Fm: Dan Corritore 70243,1110 # 343771
- To: TIM 76247,1130 (X) Date: 28-Apr-93 22:15:39
-
- I can't think of any logical operations, but this is what I'd do (in C, at
- least):
-
- C[n]= B[n] ? B[n] : A[n];
-
- Sorry.. I can't think of any better way than to test for 0!
-
- Now, in assembly, there is a neat trick which can be performed on 486+
- processors, which doesn't require a jump. Here it is:
-
- mov ah,B[n] ; pseudo-code -ish (implement how you choose)
-
- mov al,0 ; the 'tester'
-
- mov bh,A[n] ; again, pseudo-code -ish
-
- ;Now, here's the tricky part:
-
- cmpxchg ah,bh ; now, ah will equal the correct value
-
- mov C[n],ah ; pseudo-code ish..
-
- There you go! Don't understand? Well, here's how it goes.. the
- 'cmpxchg ah,bh' instruction boils down to this:
-
- if (AL==AH) AH=BH; // remember, AL==0
- else AL=AH; // this part we don't care about..
-
- // (what we do care about is that it didn't change AH)
-
- Which equals, using the above code,
-
- if (B[n]==0) C[n]=A[n];
- else C[n]=B[n];
-
- Do you understand?
-
- _Dan
-
- P.S. Thanks.. I needed to use my brain today! <g>
- ...........................................................................
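
A minimal C sketch of the selection Dan describes, assuming the usual 8-bit register semantics of CMPXCHG (the accumulator AL is compared with the destination; on a match the destination takes the source, otherwise AL takes the destination). The names `cmpxchg8` and `merge_pixel` are hypothetical and only model the instruction; they are not how you would write the blit itself.

```c
#include <stdint.h>

/* Hypothetical model of the 8-bit form "cmpxchg dest,src".
   acc stands in for AL, dest for AH, src for BH. */
static void cmpxchg8(uint8_t *acc, uint8_t *dest, uint8_t src)
{
    if (*acc == *dest)
        *dest = src;    /* equal: destination takes the source */
    else
        *acc = *dest;   /* not equal: accumulator takes the destination */
}

/* Merge one pixel the way the assembly sequence above does. */
static uint8_t merge_pixel(uint8_t a, uint8_t b)
{
    uint8_t al = 0;          /* mov al,0     : the 'tester'   */
    uint8_t ah = b;          /* mov ah,B[n]                   */
    cmpxchg8(&al, &ah, a);   /* cmpxchg ah,bh with bh = A[n]  */
    return ah;               /* mov C[n],ah                   */
}
```

With this model, `merge_pixel(7, 0)` yields 7 (B is transparent, so A shows through) and `merge_pixel(7, 9)` yields 9 (B is opaque and left untouched).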
-
- Fm: Hans Peter Rushworth 100031,473 # 343849
- To: TIM 76247,1130 (X) Date: 28-Apr-93 23:39:32
-
- >> where 'mask' is 0xFF everywhere B[?] = 0x00, and 0x00 everywhere else. It
- seems to me that there is no Boolean way to create 'mask'
-
- Dan's way is correct IMO, but since you ask about how to make the mask:
-
- movzx ax,byte ptr B[?] ;AL = pixel, AH = 0
- dec ax ;AH = 0xFF if B[?] was zero, else 0x00
- inc al ;restore AL=pixel
-
- Peter.
- ...........................................................................
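
The decrement trick can be sketched in C roughly as follows, assuming a 16-bit temporary stands in for AX. The helper name `zero_mask` is hypothetical. The borrow from the low byte reaches the high byte only when the pixel was zero, so the high byte ends up as the mask.

```c
#include <stdint.h>

/* Hypothetical helper: returns 0xFF when b == 0, otherwise 0x00. */
static uint8_t zero_mask(uint8_t b)
{
    uint16_t ax = b;            /* movzx ax,byte ptr B[?] : AH = 0           */
    ax = (uint16_t)(ax - 1);    /* dec ax : borrows into AH only if AL was 0 */
    return (uint8_t)(ax >> 8);  /* AH now holds the mask                     */
}
```

The `inc al` step of the assembly version only restores the pixel in AL, so it has no counterpart here: `b` itself is never modified.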
-
- Fm: Dan Corritore 70243,1110 # 344166
- To: Hans Peter Rushworth 100031,473 (X) Date: 29-Apr-93 13:14:17
-
- That's a neat technique for creating the 'mask'. I guess I should've listened
- to what he was asking more closely. (instead of giving him a way to do it
- without the mask). There are so many tricks you can do in Assembly language,
- which is why we love to program in it, yes?<g>
- _Dan
- ...........................................................................
-
- Fm: Hans Peter Rushworth 100031,473 # 344189
- To: Dan Corritore 70243,1110 (X) Date: 29-Apr-93 13:44:03
-
- >> so many tricks you can do in Assembly language, which is why we love to
- program in it, yes?<g>
-
- Absolutely!
-
- I still think your C = B ? B : A; or if(!(C=B)) C=A;
-
- is the proper method. But I thought the dec trick was interesting.
-
- Peter.
- ...........................................................................
-
- Fm: John Dlugosz [ViewPoint] 70007,4657 # 343887
- To: TIM 76247,1130 (X) Date: 29-Apr-93 00:17:57
-
- re loops: You need a loop to process the thing anyway. Step through C, A,
- and B at the same time, processing one byte. (I assume you have 1 byte per
- pixel, packed pixel format)
-
- So, create the mask from B just for that byte, when needed, as part of the
- main loop.
-
- For a hint, look at the way the compiler generates code for the prefix !
- operator. It involves no jumps.
-
- However, since you need a jump _anyway_ to get back to the top of the loop,
- you can double the loop and have the test on B being zero branch to two
- different pieces of code, one that ORs in B and advances and one that just
- advances, and put these _before_ the test, so you still only have exactly
- one jump per iteration.
-
- --John
- ...........................................................................
-
- Fm: Mark 'SAM' Baker 100025,444 # 344146
- To: John Dlugosz [ViewPoint] 70007,4657 (X) Date: 29-Apr-93 12:48:49
-
- All these complications.
-
- Surely the fastest and most efficient (in terms of code size) method is :-
-
- mov al,b[n]
- jnz passover
- mov al,a[n]
- :passover
- mov c[n],al
- < this gives you the resultant pixel in c[n], no need to mask or anything >
-
- Mark
- ...........................................................................
-
- Fm: Hans Peter Rushworth 100031,473 # 344190
- To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 13:44:14
-
- Mark,
-
- > mov al,b[n]
- > jnz passover
-
- One small point: unlike nice Motorola processors, the mov instruction does
- not affect any flags, so you would need a compare of some sort.
-
- Tim's request included a "no branches" constraint, that's why we are being
- devious in our ways. <g>
-
- Peter.
- ...........................................................................
-
- Fm: John Dlugosz [ViewPoint] 70007,4657 # 344361
- To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 18:39:32
-
- <<Surely the fastest and most efficient (in terms of code size) method is>>
-
- Smallest code, but not the fastest! jumps are _expensive_. We go to great
- lengths to avoid them. Figure 7 clocks plus pipeline delays for your "jnz
- passover". That is half the time it takes to multiply, or longer than all
- the rest of the instructions in that loop combined.
- ...........................................................................
-
- Fm: Dan Corritore 70243,1110 # 344165
- To: John Dlugosz [ViewPoint] 70007,4657 (X) Date: 29-Apr-93 13:14:12
-
- Yeah.. I forgot about the ! operator. It should be easy to create a mask
- doing that.. as such:
-
- mask= -!B[?]; // do a 'not' and then negate it
-
- This way, it will be either -1 (0xff) for a zero value or 0 for a non-zero
- value.
-
- _Dan
- ...........................................................................
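
Putting the `-!B[?]` mask together with Tim's original formula C = (A & mask) | B gives something like the sketch below. The function name `merge_sprites` and its parameter layout are assumptions for illustration only. Whether the compiler really emits jump-free code for the `!` operator depends on the compiler and the target, so treat this as a starting point for a timing test rather than a guaranteed win.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: merge sprite b over sprite a into c,
   treating a value of 0 in b as transparent. */
static void merge_sprites(uint8_t *c, const uint8_t *a,
                          const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* !b[i] is 1 for a zero pixel, 0 otherwise; negating gives -1 or 0,
           which truncates to 0xFF or 0x00. */
        uint8_t mask = (uint8_t)-!b[i];
        c[i] = (uint8_t)((a[i] & mask) | b[i]);   /* C = (A & mask) | B */
    }
}
```

The loop itself still needs its one jump per iteration, as John points out; only the per-pixel test is expressed without an explicit branch.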
-
- Fm: Mark 'SAM' Baker 100025,444 # 344151
- To: TIM 76247,1130 (X) Date: 29-Apr-93 12:54:07
-
- Why do you need to bother with all this masking?
-
- In pseudo-assembler, try this :-
-
- mov al,b[n]
- jnz not_b
- mov al,a[n]
- not_b:
- mov c[n],al
-
- This gives you the correct result, without any recourse to booleans.
- I think it is probably also the fastest, and the most efficient in code size
- that you will find.
-
- OK, a purist wouldn't like the jump (it is a GOTO by any other name), but you
- can't avoid them in assembler.
-
- procedure BIT_SET;
- begin
- C[n]:=B[n];
- if (C[n] = 0) then C[n]:=A[n];
- end;
-
- Mark
- ...........................................................................
-
- Fm: Hans Peter Rushworth 100031,473 # 344191
- To: Mark 'SAM' Baker 100025,444 (X) Date: 29-Apr-93 13:44:20
-
- Sorry for the repeat:
-
- >> mov al,b[n]
- >> jnz not_b
-
- You must insert a cmp al,0 or "or al,al" to correctly set the zero flag. The
- mov instruction does not affect it.
-
- Peter.
- ...........................................................................
-
-
-